Hyperparameter Tuning with Optuna¶
With great models comes the great problem of optimizing hyperparameters [Tha20]. Once a good search algorithm is established for hyperparameter optimization, the task becomes an engineering problem 1. Hence, we will explore an open-source library that offers a framework for solving this task.
Optuna is an automatic hyperparameter optimization software framework, particularly designed for machine learning. It features an imperative, define-by-run style user API. Thanks to this define-by-run API, code written with Optuna enjoys high modularity, and users can dynamically construct the search spaces for the hyperparameters.
Basics with scikit-learn¶
Optuna is a black-box optimizer, which means it only needs an objective function (any function that returns a numerical value) to evaluate the performance of its parameters and to decide where to sample in upcoming trials. An optimization problem is framed in the Optuna API using two basic concepts: study and trial.
A study is conceptually an optimization based on an objective function, while a trial is a single execution of an objective function. The combination of hyperparameters for each trial is sampled according to some sampling algorithm defined by the study.
In the following code example, the search space is constructed within imperative Python code, e.g. inside conditionals or loops. On the other hand, recall that for GridSearchCV and RandomizedSearchCV in scikit-learn, we had to define the entire search space before running the search algorithm.
!pip install optuna
import optuna
import pandas as pd
from sklearn import ensemble, svm
from sklearn import datasets
from sklearn import model_selection
from functools import partial
import joblib
# [1] Define an objective function to be maximized.
def objective(trial, X, y):
    # [2] Suggest values for the hyperparameters using the trial object.
    clf_name = trial.suggest_categorical('classifier', ['SVC', 'RandomForest'])
    if clf_name == 'SVC':
        svc_c = trial.suggest_loguniform('svc_c', 1e-10, 1e10)
        clf = svm.SVC(C=svc_c, gamma='auto')
    else:
        rf_max_depth = int(trial.suggest_loguniform('rf_max_depth', 2, 32))
        clf = ensemble.RandomForestClassifier(max_depth=rf_max_depth, n_estimators=10)

    score = model_selection.cross_val_score(clf, X, y, n_jobs=-1, cv=5)
    return score.mean()
# [3] Create a study object and optimize the objective function.
X, y = datasets.load_breast_cancer(return_X_y=True)
study = optuna.create_study(direction="maximize")
study.optimize(partial(objective, X=X, y=y), n_trials=5)
Collecting optuna
...
Successfully installed Mako-1.1.5 alembic-1.7.3 autopage-0.4.0 cliff-3.9.0 cmaes-0.8.2 cmd2-2.2.0 colorama-0.4.4 colorlog-6.4.1 optuna-2.9.1 pbr-5.6.0 pyperclip-1.8.2 stevedore-3.4.0
[I 2021-09-23 10:37:22,628] A new study created in memory with name: no-name-cf61fd1f-c292-4d08-8331-b61df2b285b5
[I 2021-09-23 10:37:23,768] Trial 0 finished with value: 0.9402732494954199 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 2.3445281943268284}. Best is trial 0 with value: 0.9402732494954199.
[I 2021-09-23 10:37:23,896] Trial 1 finished with value: 0.9718987734823784 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 7.197952463169158}. Best is trial 1 with value: 0.9718987734823784.
[I 2021-09-23 10:37:23,989] Trial 2 finished with value: 0.6274181027790716 and parameters: {'classifier': 'SVC', 'svc_c': 0.051722222321909525}. Best is trial 1 with value: 0.9718987734823784.
[I 2021-09-23 10:37:24,061] Trial 3 finished with value: 0.6274181027790716 and parameters: {'classifier': 'SVC', 'svc_c': 9.41442891449573e-09}. Best is trial 1 with value: 0.9718987734823784.
[I 2021-09-23 10:37:24,147] Trial 4 finished with value: 0.6274181027790716 and parameters: {'classifier': 'SVC', 'svc_c': 0.0011946516826625228}. Best is trial 1 with value: 0.9718987734823784.
The study object saves the result of evaluating the objective at each trial, where a trial is essentially one choice of hyperparameters to evaluate. In the above study, the problem of model selection is itself framed as a hyperparameter optimization problem: we choose between an SVM and a Random Forest.
study.trials_dataframe().head()
| | number | value | datetime_start | datetime_complete | duration | params_classifier | params_rf_max_depth | params_svc_c | state |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0.940273 | 2021-09-23 10:37:22.632861 | 2021-09-23 10:37:23.767644 | 0 days 00:00:01.134783 | RandomForest | 2.344528 | NaN | COMPLETE |
| 1 | 1 | 0.971899 | 2021-09-23 10:37:23.770200 | 2021-09-23 10:37:23.896625 | 0 days 00:00:00.126425 | RandomForest | 7.197952 | NaN | COMPLETE |
| 2 | 2 | 0.627418 | 2021-09-23 10:37:23.900134 | 2021-09-23 10:37:23.989031 | 0 days 00:00:00.088897 | SVC | NaN | 5.172222e-02 | COMPLETE |
| 3 | 3 | 0.627418 | 2021-09-23 10:37:23.990912 | 2021-09-23 10:37:24.060850 | 0 days 00:00:00.069938 | SVC | NaN | 9.414429e-09 | COMPLETE |
| 4 | 4 | 0.627418 | 2021-09-23 10:37:24.062551 | 2021-09-23 10:37:24.147696 | 0 days 00:00:00.085145 | SVC | NaN | 1.194652e-03 | COMPLETE |
Fine tuning Random Forest¶
Here we focus on tuning a single Random Forest model, then plot the accuracy for each pair of hyperparameters.
def objective(trial):
    max_depth = trial.suggest_int('max_depth', 2, 128, log=True)
    max_features = trial.suggest_float('max_features', 0.1, 1.0)
    n_estimators = trial.suggest_int('n_estimators', 100, 800)

    clf = ensemble.RandomForestClassifier(
        max_depth=max_depth,
        n_estimators=n_estimators,
        max_features=max_features,
        random_state=42)

    score = model_selection.cross_val_score(clf, X, y, n_jobs=-1, cv=5)
    return score.mean()
study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=60)
[I 2021-09-23 06:42:48,998] A new study created in memory with name: no-name-21890d05-2176-4274-8c50-b293d30f31e3
[I 2021-09-23 06:42:51,527] Trial 0 finished with value: 0.9543393882937432 and parameters: {'max_depth': 3, 'max_features': 0.9705595686843378, 'n_estimators': 100}. Best is trial 0 with value: 0.9543393882937432.
[I 2021-09-23 06:42:55,482] Trial 1 finished with value: 0.9596180717279925 and parameters: {'max_depth': 13, 'max_features': 0.46775203542826993, 'n_estimators': 246}. Best is trial 1 with value: 0.9596180717279925.
[I 2021-09-23 06:43:00,795] Trial 2 finished with value: 0.9578792113025927 and parameters: {'max_depth': 23, 'max_features': 0.9062990961333452, 'n_estimators': 155}. Best is trial 1 with value: 0.9596180717279925.
[I 2021-09-23 06:43:15,038] Trial 3 finished with value: 0.95960254618848 and parameters: {'max_depth': 5, 'max_features': 0.2578188083038355, 'n_estimators': 749}. Best is trial 1 with value: 0.9596180717279925.
[I 2021-09-23 06:43:30,242] Trial 4 finished with value: 0.9596180717279925 and parameters: {'max_depth': 37, 'max_features': 0.749490296122596, 'n_estimators': 602}. Best is trial 1 with value: 0.9596180717279925.
[I 2021-09-23 06:43:36,503] Trial 5 finished with value: 0.9613879832324173 and parameters: {'max_depth': 22, 'max_features': 0.6506965527914478, 'n_estimators': 416}. Best is trial 5 with value: 0.9613879832324173.
[I 2021-09-23 06:43:43,341] Trial 6 finished with value: 0.9525694767893185 and parameters: {'max_depth': 3, 'max_features': 0.1995450947329181, 'n_estimators': 794}. Best is trial 5 with value: 0.9613879832324173.
[I 2021-09-23 06:43:51,923] Trial 7 finished with value: 0.9596180717279925 and parameters: {'max_depth': 103, 'max_features': 0.7122053899029194, 'n_estimators': 544}. Best is trial 5 with value: 0.9613879832324173.
[I 2021-09-23 06:43:58,840] Trial 8 finished with value: 0.9631113181183046 and parameters: {'max_depth': 54, 'max_features': 0.19194668972744688, 'n_estimators': 731}. Best is trial 8 with value: 0.9631113181183046.
[I 2021-09-23 06:44:01,250] Trial 9 finished with value: 0.9613724576929048 and parameters: {'max_depth': 16, 'max_features': 0.4894931002552039, 'n_estimators': 179}. Best is trial 8 with value: 0.9631113181183046.
[I 2021-09-23 06:44:04,481] Trial 10 finished with value: 0.95960254618848 and parameters: {'max_depth': 95, 'max_features': 0.11170498668801267, 'n_estimators': 372}. Best is trial 8 with value: 0.9631113181183046.
[I 2021-09-23 06:44:09,013] Trial 11 finished with value: 0.9613569321533924 and parameters: {'max_depth': 45, 'max_features': 0.36449069283406643, 'n_estimators': 400}. Best is trial 8 with value: 0.9631113181183046.
[I 2021-09-23 06:44:19,093] Trial 12 finished with value: 0.9596180717279925 and parameters: {'max_depth': 8, 'max_features': 0.6693668460810014, 'n_estimators': 610}. Best is trial 8 with value: 0.9631113181183046.
[I 2021-09-23 06:44:25,745] Trial 13 finished with value: 0.9596180717279925 and parameters: {'max_depth': 46, 'max_features': 0.5815778613171878, 'n_estimators': 472}. Best is trial 8 with value: 0.9631113181183046.
[I 2021-09-23 06:44:29,124] Trial 14 finished with value: 0.9631423691973297 and parameters: {'max_depth': 26, 'max_features': 0.33234183683555163, 'n_estimators': 308}. Best is trial 14 with value: 0.9631423691973297.
[I 2021-09-23 06:44:32,064] Trial 15 finished with value: 0.95960254618848 and parameters: {'max_depth': 70, 'max_features': 0.34448905500108007, 'n_estimators': 261}. Best is trial 14 with value: 0.9631423691973297.
[I 2021-09-23 06:44:37,983] Trial 16 finished with value: 0.9596025461884802 and parameters: {'max_depth': 33, 'max_features': 0.1084223731260328, 'n_estimators': 686}. Best is trial 14 with value: 0.9631423691973297.
[I 2021-09-23 06:44:41,886] Trial 17 finished with value: 0.9613569321533924 and parameters: {'max_depth': 9, 'max_features': 0.3451621038253865, 'n_estimators': 343}. Best is trial 14 with value: 0.9631423691973297.
[I 2021-09-23 06:44:47,055] Trial 18 finished with value: 0.9631268436578171 and parameters: {'max_depth': 68, 'max_features': 0.23707504951615715, 'n_estimators': 510}. Best is trial 14 with value: 0.9631423691973297.
[I 2021-09-23 06:44:53,167] Trial 19 finished with value: 0.9648812296227295 and parameters: {'max_depth': 123, 'max_features': 0.4284868695999386, 'n_estimators': 501}. Best is trial 19 with value: 0.9648812296227295.
[I 2021-09-23 06:44:56,155] Trial 20 finished with value: 0.9490607048594939 and parameters: {'max_depth': 2, 'max_features': 0.4858918861606428, 'n_estimators': 303}. Best is trial 19 with value: 0.9648812296227295.
[I 2021-09-23 06:45:01,968] Trial 21 finished with value: 0.9613879832324173 and parameters: {'max_depth': 115, 'max_features': 0.39386785749654124, 'n_estimators': 495}. Best is trial 19 with value: 0.9648812296227295.
[I 2021-09-23 06:45:07,214] Trial 22 finished with value: 0.9613724576929048 and parameters: {'max_depth': 68, 'max_features': 0.22246748754835521, 'n_estimators': 541}. Best is trial 19 with value: 0.9648812296227295.
[I 2021-09-23 06:45:12,733] Trial 23 finished with value: 0.95960254618848 and parameters: {'max_depth': 121, 'max_features': 0.27140166339787225, 'n_estimators': 465}. Best is trial 19 with value: 0.9648812296227295.
[I 2021-09-23 06:45:19,591] Trial 24 finished with value: 0.95960254618848 and parameters: {'max_depth': 73, 'max_features': 0.28521774043551834, 'n_estimators': 595}. Best is trial 19 with value: 0.9648812296227295.
[I 2021-09-23 06:45:23,744] Trial 25 finished with value: 0.9613724576929048 and parameters: {'max_depth': 31, 'max_features': 0.41460082147718447, 'n_estimators': 346}. Best is trial 19 with value: 0.9648812296227295.
[I 2021-09-23 06:45:31,008] Trial 26 finished with value: 0.9596180717279925 and parameters: {'max_depth': 23, 'max_features': 0.5486258028858528, 'n_estimators': 534}. Best is trial 19 with value: 0.9648812296227295.
[I 2021-09-23 06:45:36,144] Trial 27 finished with value: 0.9578481602235677 and parameters: {'max_depth': 80, 'max_features': 0.43353643010332477, 'n_estimators': 413}. Best is trial 19 with value: 0.9648812296227295.
[I 2021-09-23 06:45:39,317] Trial 28 finished with value: 0.9613724576929048 and parameters: {'max_depth': 56, 'max_features': 0.3148553844697188, 'n_estimators': 285}. Best is trial 19 with value: 0.9648812296227295.
[I 2021-09-23 06:45:40,902] Trial 29 finished with value: 0.9631268436578171 and parameters: {'max_depth': 15, 'max_features': 0.16635263953427581, 'n_estimators': 174}. Best is trial 19 with value: 0.9648812296227295.
[I 2021-09-23 06:45:49,782] Trial 30 finished with value: 0.9596180717279925 and parameters: {'max_depth': 89, 'max_features': 0.554596584580045, 'n_estimators': 658}. Best is trial 19 with value: 0.9648812296227295.
[I 2021-09-23 06:45:50,958] Trial 31 finished with value: 0.9543393882937432 and parameters: {'max_depth': 10, 'max_features': 0.18035825668984193, 'n_estimators': 124}. Best is trial 19 with value: 0.9648812296227295.
[I 2021-09-23 06:45:52,746] Trial 32 finished with value: 0.9666356155876418 and parameters: {'max_depth': 16, 'max_features': 0.14768644779668522, 'n_estimators': 192}. Best is trial 32 with value: 0.9666356155876418.
[I 2021-09-23 06:45:55,005] Trial 33 finished with value: 0.9648812296227295 and parameters: {'max_depth': 28, 'max_features': 0.2510243346455137, 'n_estimators': 219}. Best is trial 32 with value: 0.9666356155876418.
[I 2021-09-23 06:45:59,103] Trial 34 finished with value: 0.9596491228070174 and parameters: {'max_depth': 19, 'max_features': 0.9745896683213247, 'n_estimators': 217}. Best is trial 32 with value: 0.9666356155876418.
[I 2021-09-23 06:46:00,268] Trial 35 finished with value: 0.9613569321533924 and parameters: {'max_depth': 12, 'max_features': 0.30501879050629477, 'n_estimators': 103}. Best is trial 32 with value: 0.9666356155876418.
[I 2021-09-23 06:46:02,213] Trial 36 finished with value: 0.9631113181183046 and parameters: {'max_depth': 7, 'max_features': 0.15632147396180562, 'n_estimators': 219}. Best is trial 32 with value: 0.9666356155876418.
[I 2021-09-23 06:46:05,875] Trial 37 finished with value: 0.9578636857630801 and parameters: {'max_depth': 6, 'max_features': 0.8281693351932028, 'n_estimators': 220}. Best is trial 32 with value: 0.9666356155876418.
[I 2021-09-23 06:46:09,494] Trial 38 finished with value: 0.9613414066138798 and parameters: {'max_depth': 4, 'max_features': 0.44095789977754735, 'n_estimators': 313}. Best is trial 32 with value: 0.9666356155876418.
[I 2021-09-23 06:46:11,264] Trial 39 finished with value: 0.9648967551622418 and parameters: {'max_depth': 28, 'max_features': 0.37846712295417323, 'n_estimators': 151}. Best is trial 32 with value: 0.9666356155876418.
[I 2021-09-23 06:46:12,765] Trial 40 finished with value: 0.9666511411271541 and parameters: {'max_depth': 12, 'max_features': 0.24342019325477898, 'n_estimators': 144}. Best is trial 40 with value: 0.9666511411271541.
[I 2021-09-23 06:46:14,236] Trial 41 finished with value: 0.9666511411271541 and parameters: {'max_depth': 18, 'max_features': 0.25686751699207633, 'n_estimators': 142}. Best is trial 40 with value: 0.9666511411271541.
[I 2021-09-23 06:46:15,732] Trial 42 finished with value: 0.9631268436578171 and parameters: {'max_depth': 18, 'max_features': 0.25601226916584446, 'n_estimators': 149}. Best is trial 40 with value: 0.9666511411271541.
[I 2021-09-23 06:46:16,925] Trial 43 finished with value: 0.9560937742586555 and parameters: {'max_depth': 12, 'max_features': 0.13252924130363736, 'n_estimators': 139}. Best is trial 40 with value: 0.9666511411271541.
[I 2021-09-23 06:46:19,054] Trial 44 finished with value: 0.9613879832324173 and parameters: {'max_depth': 20, 'max_features': 0.3926218691035751, 'n_estimators': 182}. Best is trial 40 with value: 0.9666511411271541.
[I 2021-09-23 06:46:20,062] Trial 45 finished with value: 0.9596180717279925 and parameters: {'max_depth': 27, 'max_features': 0.20070379523636273, 'n_estimators': 103}. Best is trial 40 with value: 0.9666511411271541.
[I 2021-09-23 06:46:21,928] Trial 46 finished with value: 0.9648812296227295 and parameters: {'max_depth': 39, 'max_features': 0.2262102823158007, 'n_estimators': 192}. Best is trial 40 with value: 0.9666511411271541.
[I 2021-09-23 06:46:23,681] Trial 47 finished with value: 0.9648812296227295 and parameters: {'max_depth': 42, 'max_features': 0.21154640307009018, 'n_estimators': 179}. Best is trial 40 with value: 0.9666511411271541.
[I 2021-09-23 06:46:27,221] Trial 48 finished with value: 0.968421052631579 and parameters: {'max_depth': 13, 'max_features': 0.5150620025995097, 'n_estimators': 268}. Best is trial 48 with value: 0.968421052631579.
[I 2021-09-23 06:46:30,613] Trial 49 finished with value: 0.968421052631579 and parameters: {'max_depth': 13, 'max_features': 0.5331741027433536, 'n_estimators': 263}. Best is trial 48 with value: 0.968421052631579.
[I 2021-09-23 06:46:34,135] Trial 50 finished with value: 0.9578792113025927 and parameters: {'max_depth': 13, 'max_features': 0.6088892609241104, 'n_estimators': 249}. Best is trial 48 with value: 0.968421052631579.
[I 2021-09-23 06:46:37,691] Trial 51 finished with value: 0.9666511411271541 and parameters: {'max_depth': 10, 'max_features': 0.5005211447500874, 'n_estimators': 274}. Best is trial 48 with value: 0.968421052631579.
[I 2021-09-23 06:46:41,115] Trial 52 finished with value: 0.9666511411271541 and parameters: {'max_depth': 11, 'max_features': 0.5270075737803348, 'n_estimators': 256}. Best is trial 48 with value: 0.968421052631579.
[I 2021-09-23 06:46:44,625] Trial 53 finished with value: 0.968421052631579 and parameters: {'max_depth': 11, 'max_features': 0.5327507689968285, 'n_estimators': 264}. Best is trial 48 with value: 0.968421052631579.
[I 2021-09-23 06:46:49,679] Trial 54 finished with value: 0.9613879832324173 and parameters: {'max_depth': 7, 'max_features': 0.6475633789841244, 'n_estimators': 346}. Best is trial 48 with value: 0.968421052631579.
[I 2021-09-23 06:46:53,394] Trial 55 finished with value: 0.9613879832324173 and parameters: {'max_depth': 10, 'max_features': 0.5337701892828655, 'n_estimators': 271}. Best is trial 48 with value: 0.968421052631579.
[I 2021-09-23 06:46:57,068] Trial 56 finished with value: 0.9631113181183046 and parameters: {'max_depth': 5, 'max_features': 0.5162466114601653, 'n_estimators': 288}. Best is trial 48 with value: 0.968421052631579.
[I 2021-09-23 06:47:03,202] Trial 57 finished with value: 0.9613879832324173 and parameters: {'max_depth': 9, 'max_features': 0.7434622439009879, 'n_estimators': 384}. Best is trial 48 with value: 0.968421052631579.
[I 2021-09-23 06:47:07,799] Trial 58 finished with value: 0.9631268436578171 and parameters: {'max_depth': 14, 'max_features': 0.5941045922128059, 'n_estimators': 332}. Best is trial 48 with value: 0.968421052631579.
[I 2021-09-23 06:47:11,341] Trial 59 finished with value: 0.9631578947368421 and parameters: {'max_depth': 8, 'max_features': 0.6420539195421321, 'n_estimators': 239}. Best is trial 48 with value: 0.968421052631579.
study.best_params
{'max_depth': 13, 'max_features': 0.5150620025995097, 'n_estimators': 268}
study.best_value
0.968421052631579
Sampling algorithms¶
import matplotlib.pyplot as plt

fig, axes = plt.subplots(nrows=1, ncols=3)

def plot_results(study, p1, p2, j, cb):
    study.trials_dataframe().plot(
        kind='scatter', ax=axes[j], x=p1, y=p2,
        c='value', s=60, cmap=plt.get_cmap("jet"),
        colorbar=cb, label="accuracy", figsize=(16, 4)
    )

plot_results(study, 'params_max_depth', 'params_n_estimators', j=0, cb=False)
plot_results(study, 'params_max_depth', 'params_max_features', j=1, cb=False)
plot_results(study, 'params_n_estimators', 'params_max_features', j=2, cb=True);
Figure. TPE in action. Optuna uses the Tree-structured Parzen Estimator (TPE) [BBBK11] as its default sampler, which is a form of Bayesian optimization. Observe that the hyperparameter space is searched more efficiently than with random search, with the sampler choosing points closer to previous good results. Samplers are specified when creating a study:
study = optuna.create_study(direction="maximize", sampler=optuna.samplers.TPESampler())
From the docs:
On each trial, for each parameter, TPE fits one Gaussian Mixture Model (GMM) l(x) to the set of parameter values associated with the best objective values, and another GMM g(x) to the remaining parameter values. It chooses the parameter value x that maximizes the ratio l(x)/g(x).
Thus, TPE samples every hyperparameter independently — no explicit hyperparameter interactions are considered when sampling future trials, although other parameters implicitly affect objective value. Optuna also implements old friends random and grid search in the following samplers:
optuna.samplers.GridSampler
optuna.samplers.RandomSampler
Results from the paper [ASY+19]:
TPE+CMA-ES sampling can be implemented as follows:
sampler = optuna.samplers.CmaEsSampler(
    warn_independent_sampling=False,
    independent_sampler=optuna.samplers.TPESampler()
)
This uses the CMA-ES algorithm [Han16] with TPE for searching dynamically constructed hyperparameters (as CMA-ES requires that parameters are specified prior to the optimization).
Visualizations¶
First define a helper function for displaying plotly plots as HTML.
from IPython.core.display import display, HTML
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
init_notebook_mode(connected=True)
config={'showLink': False, 'displayModeBar': False}
fig_count = 0
# See https://github.com/executablebooks/jupyter-book/issues/93 <!>
# Solves issue of having blank plotly plots in the build. No need to
# save the generated HTML files. Probably embedded into the notebook.
def plot_html(fig):
    global fig_count
    plot(fig, filename=f'optuna-{fig_count}.html', config=config)
    display(HTML(f'optuna-{fig_count}.html'))
    fig_count += 1
Optuna provides visualization functions in the optuna.visualization library 2. The following plot shows the best objective value found as the trials progress. The increasing trend in accuracy indicates that the TPE sampler is working well, i.e. the search algorithm learns from previous trials.
plot_html(optuna.visualization.plot_optimization_history(study))
The parallel coordinate plot gives us a feel for how the hyperparameters interact. For instance, max_features around 0.5 with n_estimators around 280 and max_depth around 20 generally performs well. This setting includes the best performing hyperparameters. To isolate subsets of lines, use the interactive capabilities of the plot below by dragging on each axis to restrict its range. See the figure immediately below.
plot_html(optuna.visualization.plot_parallel_coordinate(study))
Using sliders to restrict values for certain parameters.¶
Slice plots project the path of the optimizer in the hyperparameter space onto each dimension, with each point shifted along the \(y\)-axis according to its objective value. A large spread of dark dots indicates that a large range of values of that hyperparameter remains feasible even at later stages. Meanwhile, a small spread means that the sampler focuses on a small part of the search space; in that case, other hyperparameters implicitly improve the objective. For example, the parameter max_features is explored over a wide range even at later trials. Hence, we think of this feature as important. Indeed, the importance plot below supports this.
plot_html(optuna.visualization.plot_slice(study, params=['n_estimators', 'max_depth', 'max_features']))
By default, the hyperparameter importance evaluator in Optuna is optuna.importance.FanovaImportanceEvaluator. This takes as input performance data gathered with different hyperparameter settings of the algorithm, fits a random forest to capture the relationship between hyperparameters and performance, and then applies functional ANOVA to assess how important each of the hyperparameters and each low-order interaction of hyperparameters is to performance [HHLB14]. From the docs:
The performance of fANOVA depends on the prediction performance of the underlying random forest model. In order to obtain high prediction performance, it is necessary to cover a wide range of the hyperparameter search space. It is recommended to use an exploration-oriented sampler such as RandomSampler.
fig = optuna.visualization.plot_param_importances(study)
fig.update_layout(width=600, height=350)
plot_html(fig)
To visualize interactions of any pair of hyperparameters, we use contour plots. The contour plots indicate regions of high and low objective value.
fig = optuna.visualization.plot_contour(study, params=["max_depth", "max_features"])
fig.update_layout(width=550, height=500)
plot_html(fig)
Neural networks¶
As noted above, we should always perform tuning within a cross-validation framework. However, with neural networks, 5-fold CV would require too much compute time and hence too many resources, e.g. GPU usage. Instead, we perform tuning on a hold-out validation set and hope for the best.
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import torchvision.transforms as transforms
import torchvision.datasets as datasets
from torch.utils.data import Dataset, DataLoader
from sklearn import model_selection
from sklearn.datasets import fetch_openml
from tqdm import tqdm
import optuna
import numpy as np
Define a simple network.
class MLPClassifier(nn.Module):
    """
    Neural network with multiple hidden fully-connected layers with ReLU
    activation and dropout.
    """
    def __init__(self, input_size, num_classes, n_layers, out_features, drop_rate):
        super().__init__()
        layers = []
        in_features = input_size
        for i in range(n_layers):
            m = nn.Linear(in_features, out_features[i])
            nn.init.kaiming_normal_(m.weight)
            nn.init.constant_(m.bias, 0)
            layers.append(m)
            layers.append(nn.ReLU())
            layers.append(nn.Dropout(drop_rate))
            in_features = out_features[i]
        layers.append(nn.Linear(in_features, num_classes))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)
We also define a Dataset class for MNIST.
class MNISTDataset(Dataset):
    def __init__(self, features, targets, transform=None):
        self.features = features
        self.targets = targets
        self.transform = transform

    def __len__(self):
        return self.features.shape[0]

    def __getitem__(self, i):
        X = self.features[i, :]
        y = self.targets[i]
        if self.transform is not None:
            X = self.transform(X)
        return X, y
Define a trainer for the neural network model. This will handle all loss and metric evaluation, as well as backpropagation.
class Engine:
    """Neural network trainer."""
    def __init__(self, model, device, optimizer):
        self.model = model
        self.device = device
        self.optimizer = optimizer

    @staticmethod
    def loss_fn(outputs, targets):
        return nn.CrossEntropyLoss()(outputs, targets)

    def train(self, data_loader):
        """Train model on one epoch. Return train loss."""
        self.model.train()
        loss = 0
        for i, (data, targets) in enumerate(data_loader):
            data = data.to(self.device).reshape(data.shape[0], -1).float()
            targets = targets.to(self.device).long()

            # Forward pass
            outputs = self.model(data)
            J = self.loss_fn(outputs, targets)

            # Backward pass
            self.optimizer.zero_grad()
            J.backward()
            self.optimizer.step()

            # Cumulative loss
            loss += (J.detach().item() - loss) / (i + 1)

        return loss

    def eval(self, data_loader):
        """Return validation loss and validation accuracy."""
        self.model.eval()
        num_correct = 0
        num_samples = 0
        loss = 0.0
        with torch.no_grad():
            for i, (data, targets) in enumerate(data_loader):
                data = data.to(self.device).float()
                targets = targets.to(self.device)

                # Forward pass
                data = data.reshape(data.shape[0], -1)
                out = self.model(data)
                J = self.loss_fn(out, targets)
                _, preds = out.max(dim=1)

                # Cumulative metrics
                loss += (J.detach().item() - loss) / (i + 1)
                num_correct += (preds == targets).sum().item()
                num_samples += preds.shape[0]

        acc = num_correct / num_samples
        return loss, acc
Some config and setup prior to training. For our dataset, we use MNIST which we get from scikit-learn.
# Config
RANDOM_STATE = 42
DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'
EPOCHS = 100
PATIENCE = 5
INPUT_SIZE = 784
NUM_CLASSES = 10
# Fetch data
MNIST = fetch_openml("mnist_784")
X = MNIST['data'].reshape(-1, 28, 28)
y = MNIST['target'].astype(int)
# Create folds
cv = model_selection.StratifiedKFold(n_splits=5)
trn_, val_ = next(iter(cv.split(X=X, y=y)))
# Get train and valid data loaders
train_dataset = MNISTDataset(X[trn_, :], y[trn_], transform=transforms.ToTensor())
valid_dataset = MNISTDataset(X[val_, :], y[val_], transform=transforms.ToTensor())
Intermediate values¶
Finally, we set up the study instance and its objective function. Note that the search space is dynamically constructed depending on the number of layers (i.e. an earlier suggestion for a hyperparameter). During training, we perform early stopping on validation loss: if no new minimum validation loss is found after 5 epochs, the minimum validation loss is returned as the objective 3.
Computing intermediate values allows us to prune unpromising trials to conserve resources. The default pruner in Optuna is optuna.pruners.MedianPruner, which prunes a trial if its best intermediate result as of the current step (e.g. current best valid loss) is worse than the median of all intermediate results of previous trials at that step. In other words, the best intermediate result of a pruned trial is worse than the best intermediate results of half of the other trials as of that step. In our case, if the minimum val. loss does not improve quickly enough, the trial is pruned. Of course, the validation loss could still descend rapidly at later steps, but the median pruner does not bet on this happening.
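The median rule itself is simple enough to sketch in plain Python, independently of Optuna (assuming lower is better, as with validation loss; the function name and sample values are ours):

```python
import statistics

def should_prune(current_best, previous_bests_at_step):
    """Prune if this trial's best value so far is worse (higher, when
    minimizing) than the median of previous trials' bests at this step."""
    if not previous_bests_at_step:
        return False  # nothing to compare against yet
    return current_best > statistics.median(previous_bests_at_step)

# Hypothetical best-so-far validation losses of earlier trials at some step:
others = [0.30, 0.25, 0.40, 0.35, 0.28]   # median = 0.30
print(should_prune(0.33, others))  # True: worse than half the previous trials
print(should_prune(0.27, others))  # False: better than the median
```

Optuna's actual MedianPruner adds knobs on top of this rule, such as `n_startup_trials` (disable pruning until enough trials have finished) and `n_warmup_steps` (disable pruning during the first steps of each trial).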
def define_model(trial):
    # Optimize the number of layers, hidden units, and dropout rate in each layer.
    n_layers = trial.suggest_int("n_layers", 1, 3)
    out_features = []
    drop_rate = trial.suggest_float('dropout_rate', 0.2, 0.5)
    for i in range(n_layers):
        out_features.append(trial.suggest_int("n_units_l{}".format(i), 4, 128))
    return MLPClassifier(INPUT_SIZE, NUM_CLASSES, n_layers, out_features, drop_rate)
def objective(trial):
    model = define_model(trial).to(DEVICE)
    batch_size = trial.suggest_int('batch_size', 8, 512, log=True)
    learning_rate = trial.suggest_float('lr', 1e-5, 1e-1, log=True)  # replaces deprecated suggest_loguniform
    weight_decay = trial.suggest_float('weight_decay', 0.0, 0.5)
    optimizer = optim.Adam(model.parameters(), lr=learning_rate, weight_decay=weight_decay)
    scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1, patience=3)
    engine = Engine(model, DEVICE, optimizer)

    # Init. dataloaders
    train_loader = DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)
    valid_loader = DataLoader(dataset=valid_dataset, batch_size=batch_size, shuffle=True)

    # Run training
    best_loss = np.inf
    patience = PATIENCE
    for epoch in tqdm(range(EPOCHS), total=EPOCHS, leave=False):
        # Train and validation step
        train_loss = engine.train(train_loader)
        valid_loss, valid_acc = engine.eval(valid_loader)

        # Reduce learning rate
        if scheduler is not None:
            scheduler.step(valid_loss)

        # Early stopping
        if valid_loss < best_loss:
            best_loss = valid_loss
            patience = PATIENCE
        else:
            patience -= 1
            if patience == 0:
                break

        # Pruning unpromising trials
        trial.report(valid_loss, step=epoch)
        if trial.should_prune():
            raise optuna.TrialPruned()

    return best_loss
# Create and run optimization problem
study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=60)
[I 2021-09-23 10:38:02,389] A new study created in memory with name: no-name-15109c65-2b02-4b60-a01f-68019444fb10
[I 2021-09-23 10:59:05,092] Trial 0 finished with value: 0.159926038460392 and parameters: {'n_layers': 3, 'dropout_rate': 0.4237878293578906, 'n_units_l0': 105, 'n_units_l1': 12, 'n_units_l2': 57, 'batch_size': 11, 'lr': 0.07838898783374042, 'weight_decay': 0.014310556165731292}. Best is trial 0 with value: 0.159926038460392.
[I 2021-09-23 10:59:53,080] Trial 1 finished with value: 2.3018700839246367 and parameters: {'n_layers': 1, 'dropout_rate': 0.4940082698360224, 'n_units_l0': 4, 'batch_size': 17, 'lr': 0.021487602914889235, 'weight_decay': 0.3995315957064116}. Best is trial 0 with value: 0.159926038460392.
[I 2021-09-23 11:01:36,372] Trial 2 finished with value: 0.8710345980283376 and parameters: {'n_layers': 1, 'dropout_rate': 0.2939541603566087, 'n_units_l0': 7, 'batch_size': 387, 'lr': 0.00010912886075846094, 'weight_decay': 0.017634974179537355}. Best is trial 0 with value: 0.159926038460392.
[I 2021-09-23 11:03:45,439] Trial 3 finished with value: 0.3005696608595654 and parameters: {'n_layers': 3, 'dropout_rate': 0.3961957910723303, 'n_units_l0': 80, 'n_units_l1': 74, 'n_units_l2': 36, 'batch_size': 96, 'lr': 0.0026373954379683884, 'weight_decay': 0.24822915712352522}. Best is trial 0 with value: 0.159926038460392.
[I 2021-09-23 11:08:09,224] Trial 4 finished with value: 0.08793183607278308 and parameters: {'n_layers': 2, 'dropout_rate': 0.35574022199146005, 'n_units_l0': 105, 'n_units_l1': 76, 'batch_size': 46, 'lr': 0.0005830250147792009, 'weight_decay': 0.00454269961512882}. Best is trial 4 with value: 0.08793183607278308.
[I 2021-09-23 11:11:13,412] Trial 5 finished with value: 0.17019883443939346 and parameters: {'n_layers': 2, 'dropout_rate': 0.23772345591693692, 'n_units_l0': 68, 'n_units_l1': 118, 'batch_size': 31, 'lr': 0.0010257091866609097, 'weight_decay': 0.25520616268833335}. Best is trial 4 with value: 0.08793183607278308.
[I 2021-09-23 11:11:15,334] Trial 6 pruned.
[I 2021-09-23 11:11:31,450] Trial 7 pruned.
[I 2021-09-23 11:11:33,638] Trial 8 pruned.
[I 2021-09-23 11:11:35,084] Trial 9 pruned.
[I 2021-09-23 11:12:06,172] Trial 10 pruned.
[I 2021-09-23 11:12:30,781] Trial 11 pruned.
[I 2021-09-23 11:12:41,090] Trial 12 pruned.
[I 2021-09-23 11:14:39,102] Trial 13 finished with value: 0.15866707545490213 and parameters: {'n_layers': 2, 'dropout_rate': 0.32028340139369776, 'n_units_l0': 46, 'n_units_l1': 84, 'batch_size': 54, 'lr': 0.00036889687117680296, 'weight_decay': 0.14755454483289712}. Best is trial 4 with value: 0.08793183607278308.
[I 2021-09-23 11:14:42,890] Trial 14 pruned.
[I 2021-09-23 11:14:46,241] Trial 15 pruned.
[I 2021-09-23 11:14:51,457] Trial 16 pruned.
[I 2021-09-23 11:16:10,227] Trial 17 finished with value: 0.10545255993435412 and parameters: {'n_layers': 2, 'dropout_rate': 0.28433430180887476, 'n_units_l0': 86, 'n_units_l1': 93, 'batch_size': 173, 'lr': 0.0008782486158306226, 'weight_decay': 0.10255248054833463}. Best is trial 4 with value: 0.08793183607278308.
[I 2021-09-23 11:17:34,185] Trial 18 finished with value: 0.09672328307022972 and parameters: {'n_layers': 2, 'dropout_rate': 0.2810583620463708, 'n_units_l0': 87, 'n_units_l1': 103, 'batch_size': 226, 'lr': 0.0013952324252238192, 'weight_decay': 0.07617183386981775}. Best is trial 4 with value: 0.08793183607278308.
[I 2021-09-23 11:17:43,721] Trial 19 pruned.
[I 2021-09-23 11:17:44,968] Trial 20 pruned.
[I 2021-09-23 11:18:44,154] Trial 21 finished with value: 0.103310143109411 and parameters: {'n_layers': 2, 'dropout_rate': 0.28231541834744717, 'n_units_l0': 86, 'n_units_l1': 99, 'batch_size': 177, 'lr': 0.001441251731465695, 'weight_decay': 0.08705490602314442}. Best is trial 4 with value: 0.08793183607278308.
[I 2021-09-23 11:20:01,498] Trial 22 finished with value: 0.09066066514976596 and parameters: {'n_layers': 2, 'dropout_rate': 0.2773342812914818, 'n_units_l0': 96, 'n_units_l1': 125, 'batch_size': 262, 'lr': 0.0013605459441012556, 'weight_decay': 0.05281219164907667}. Best is trial 4 with value: 0.08793183607278308.
[I 2021-09-23 11:20:02,752] Trial 23 pruned.
[I 2021-09-23 11:20:07,399] Trial 24 pruned.
[I 2021-09-23 11:20:24,401] Trial 25 pruned.
[I 2021-09-23 11:20:25,692] Trial 26 pruned.
[I 2021-09-23 11:20:27,766] Trial 27 pruned.
[I 2021-09-23 11:20:28,949] Trial 28 pruned.
[I 2021-09-23 11:20:30,214] Trial 29 pruned.
[I 2021-09-23 11:20:37,530] Trial 30 pruned.
[I 2021-09-23 11:21:45,135] Trial 31 finished with value: 0.10255725085735322 and parameters: {'n_layers': 2, 'dropout_rate': 0.28213738835019486, 'n_units_l0': 81, 'n_units_l1': 100, 'batch_size': 188, 'lr': 0.0010823376577192374, 'weight_decay': 0.08337185022784563}. Best is trial 4 with value: 0.08793183607278308.
[I 2021-09-23 11:21:51,880] Trial 32 pruned.
[I 2021-09-23 11:22:17,823] Trial 33 pruned.
[I 2021-09-23 11:22:18,941] Trial 34 pruned.
[I 2021-09-23 11:23:57,770] Trial 35 finished with value: 0.07131263853050765 and parameters: {'n_layers': 1, 'dropout_rate': 0.25003371367463706, 'n_units_l0': 95, 'batch_size': 141, 'lr': 0.0005237534419879276, 'weight_decay': 0.07353638770877091}. Best is trial 35 with value: 0.07131263853050765.
[I 2021-09-23 11:25:37,628] Trial 36 finished with value: 0.08455202862944294 and parameters: {'n_layers': 1, 'dropout_rate': 0.24280168011286699, 'n_units_l0': 95, 'batch_size': 122, 'lr': 0.0005010708623607415, 'weight_decay': 0.1824635322987805}. Best is trial 35 with value: 0.07131263853050765.
[I 2021-09-23 11:27:02,240] Trial 37 finished with value: 0.08483004270364407 and parameters: {'n_layers': 1, 'dropout_rate': 0.22159664064331255, 'n_units_l0': 98, 'batch_size': 122, 'lr': 0.0004940711510589418, 'weight_decay': 0.1741633832571235}. Best is trial 35 with value: 0.07131263853050765.
[I 2021-09-23 11:27:03,936] Trial 38 pruned.
[I 2021-09-23 11:28:40,523] Trial 39 finished with value: 0.09355619992511183 and parameters: {'n_layers': 1, 'dropout_rate': 0.2042784926760532, 'n_units_l0': 112, 'batch_size': 46, 'lr': 0.00022698429759967267, 'weight_decay': 0.2794574787467307}. Best is trial 35 with value: 0.07131263853050765.
[I 2021-09-23 11:29:18,054] Trial 40 pruned.
[I 2021-09-23 11:29:49,351] Trial 41 pruned.
[I 2021-09-23 11:31:15,710] Trial 42 finished with value: 0.1020486264151859 and parameters: {'n_layers': 1, 'dropout_rate': 0.23164804977568673, 'n_units_l0': 93, 'batch_size': 77, 'lr': 0.000285671272172161, 'weight_decay': 0.32568189877238407}. Best is trial 35 with value: 0.07131263853050765.
[I 2021-09-23 11:31:17,269] Trial 43 pruned.
[I 2021-09-23 11:31:34,596] Trial 44 pruned.
[I 2021-09-23 11:31:42,364] Trial 45 pruned.
[I 2021-09-23 11:31:43,935] Trial 46 pruned.
[I 2021-09-23 11:32:27,962] Trial 47 pruned.
[I 2021-09-23 11:32:35,345] Trial 48 pruned.
[I 2021-09-23 11:32:37,533] Trial 49 pruned.
[I 2021-09-23 11:32:42,969] Trial 50 pruned.
[I 2021-09-23 11:33:43,032] Trial 51 pruned.
[I 2021-09-23 11:34:08,690] Trial 52 pruned.
[I 2021-09-23 11:34:22,864] Trial 53 pruned.
[I 2021-09-23 11:36:40,729] Trial 54 finished with value: 0.09744334753924168 and parameters: {'n_layers': 1, 'dropout_rate': 0.21622462411934498, 'n_units_l0': 89, 'batch_size': 31, 'lr': 0.0001329248089359777, 'weight_decay': 0.2971222095134244}. Best is trial 35 with value: 0.07131263853050765.
[I 2021-09-23 11:36:43,995] Trial 55 pruned.
[I 2021-09-23 11:37:28,764] Trial 56 pruned.
[I 2021-09-23 11:40:07,780] Trial 57 finished with value: 0.0686554341496194 and parameters: {'n_layers': 1, 'dropout_rate': 0.23290628211542697, 'n_units_l0': 96, 'batch_size': 64, 'lr': 0.0006717603763568699, 'weight_decay': 0.04716562652949421}. Best is trial 57 with value: 0.0686554341496194.
[I 2021-09-23 11:40:09,440] Trial 58 pruned.
[I 2021-09-23 11:40:12,123] Trial 59 pruned.
from optuna.trial import TrialState
pruned_trials = study.get_trials(deepcopy=False, states=[TrialState.PRUNED])
complete_trials = study.get_trials(deepcopy=False, states=[TrialState.COMPLETE])
print("Study statistics: ")
print(" Number of finished trials:\t", len(study.trials))
print(" Number of pruned trials:\t", len(pruned_trials))
print(" Number of complete trials:\t", len(complete_trials))
print("\nBest trial:")
trial = study.best_trial
print(" Value: ", trial.value)
print(" Params: ")
for key, value in trial.params.items():
    print("    {}: {}".format(key, value))
Study statistics:
Number of finished trials: 60
Number of pruned trials: 41
Number of complete trials: 19
Best trial:
Value: 0.0686554341496194
Params:
n_layers: 1
dropout_rate: 0.23290628211542697
n_units_l0: 96
batch_size: 64
lr: 0.0006717603763568699
weight_decay: 0.04716562652949421
The trials below either stop early (gradient descent loses momentum) or get pruned (unlikely to improve even if gradient descent continues). Note that pruning starts at Trial 5. This can be tweaked via the n_startup_trials=5 parameter of the pruner: pruning is disabled until 5 trials have finished in the same study, so that the pruner gathers enough information about the behavior of the gradient descent optimizer before it starts to prune.
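The patience-based early stopping used in the objective above can be isolated into a small helper; the following is a sketch in plain Python (the `EarlyStopper` name is ours, not part of the notebook's code):

```python
class EarlyStopper:
    """Signal a stop when the monitored loss has not improved for
    `patience` consecutive epochs."""

    def __init__(self, patience=5):
        self.patience = patience
        self.counter = 0
        self.best_loss = float("inf")

    def step(self, loss):
        """Record one epoch's loss; return True if training should stop."""
        if loss < self.best_loss:
            self.best_loss = loss
            self.counter = 0      # reset on any new minimum
        else:
            self.counter += 1
        return self.counter >= self.patience

stopper = EarlyStopper(patience=3)
losses = [1.0, 0.8, 0.9, 0.85, 0.95, 0.9]   # no new minimum after 0.8
stops = [stopper.step(l) for l in losses]
print(stops)  # [False, False, False, False, True, True]
```

This mirrors the `patience = PATIENCE` / `patience -= 1` logic in the objective: the counter resets on every new minimum, and training halts once `patience` epochs pass without improvement.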
plot_html(optuna.visualization.plot_intermediate_values(study))
plot_html(optuna.visualization.plot_optimization_history(study))
Hyperparameter interactions¶
We look at which combinations of hyperparameters work well in the parallel coordinate plot. Note that something weird is going on here: trials with n_layers=1 have coordinates on axes where they should have no values, e.g. n_units_l1 and n_units_l2. This is a known issue for parallel plots, e.g. #1809. It turns out that lines for dynamically constructed parameters with NaNs should be skipped by the plotter. Moreover, trials with NaN values are excluded from the parameter importance computation, which limits its usefulness.
plot_html(optuna.visualization.plot_parallel_coordinate(study))
study.trials_dataframe().head()
|   | number | value | datetime_start | datetime_complete | duration | params_batch_size | params_dropout_rate | params_lr | params_n_layers | params_n_units_l0 | params_n_units_l1 | params_n_units_l2 | params_weight_decay | state |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0.159926 | 2021-09-23 10:38:02.391944 | 2021-09-23 10:59:05.091262 | 0 days 00:21:02.699318 | 11 | 0.423788 | 0.078389 | 3 | 105 | 12.0 | 57.0 | 0.014311 | COMPLETE |
| 1 | 1 | 2.301870 | 2021-09-23 10:59:05.096371 | 2021-09-23 10:59:53.079261 | 0 days 00:00:47.982890 | 17 | 0.494008 | 0.021488 | 1 | 4 | NaN | NaN | 0.399532 | COMPLETE |
| 2 | 2 | 0.871035 | 2021-09-23 10:59:53.084333 | 2021-09-23 11:01:36.371685 | 0 days 00:01:43.287352 | 387 | 0.293954 | 0.000109 | 1 | 7 | NaN | NaN | 0.017635 | COMPLETE |
| 3 | 3 | 0.300570 | 2021-09-23 11:01:36.374203 | 2021-09-23 11:03:45.438474 | 0 days 00:02:09.064271 | 96 | 0.396196 | 0.002637 | 3 | 80 | 74.0 | 36.0 | 0.248229 | COMPLETE |
| 4 | 4 | 0.087932 | 2021-09-23 11:03:45.440817 | 2021-09-23 11:08:09.224148 | 0 days 00:04:23.783331 | 46 | 0.355740 | 0.000583 | 2 | 105 | 76.0 | NaN | 0.004543 | COMPLETE |
study.trials_dataframe().query("state=='COMPLETE'").params_n_layers.value_counts()
1 9
2 8
3 2
Name: params_n_layers, dtype: int64
Instead, we can look at each subset of trials for a fixed value of n_layers. The resulting trials have no NaN parameters since the parameters are sampled only after a value for n_layers has been suggested. It looks like n_layers=1 works best.
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

# Isolate a study for each value of n_layers
studies = [optuna.create_study() for j in range(3)]
for j in range(3):
    studies[j].add_trials([t for t in study.trials if t.params['n_layers'] == j + 1])
    fig = optuna.visualization.plot_parallel_coordinate(studies[j])
    plot_html(fig)
[I 2021-09-23 11:45:13,690] A new study created in memory with name: no-name-b8d5c1f9-375a-4aa8-919e-4ef6fdb7492c
[I 2021-09-23 11:45:13,694] A new study created in memory with name: no-name-10bc04ca-72cd-494a-804e-99af840e0185
[I 2021-09-23 11:45:13,697] A new study created in memory with name: no-name-7f4d429f-e6b6-4624-a72f-8cede19884fa
/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:7: ExperimentalWarning:
add_trials is experimental (supported from v2.5.0). The interface can change in the future.
/usr/local/lib/python3.7/dist-packages/optuna/study/study.py:969: ExperimentalWarning:
add_trial is experimental (supported from v2.0.0). The interface can change in the future.
From the following contour plot, we see that a low batch size generally works well, together with relatively high values of dropout, learning rate, and weight decay, and a single hidden layer. From the parallel plot above, a hidden layer of around 90 units looks good.
fig = optuna.visualization.plot_contour(study, params=['batch_size', 'lr', 'n_layers', 'weight_decay', 'dropout_rate'])
fig.update_layout(autosize=False, width=1200, height=1200)
plot_html(fig)
Appendix: Hyperparameters of commonly used models¶
- 1
Like all applied machine learning solutions.
- 2
See Optuna dashboard which displays the same plots that are updated in real-time.
- 3
In practice, we save the best model parameters at this point.